PiFlow: model driven big data pipeline framework
ZHU Xiaojie, ZHAO Zihao, DU Yi
Journal of Computer Applications, 2020, 40(6): 1638-1647. DOI: 10.11772/j.issn.1001-9081.2019101793
Big data processing with complex workflows relies mostly on pipeline systems. However, existing big data pipeline systems have shortcomings in usability, function reusability, extensibility and processing performance. To address these problems, improve the efficiency of building and developing big data processing environments, and optimize the processing flow, a model driven big data pipeline framework called PiFlow was proposed. Firstly, the big data processing flow was abstracted as a Directed Acyclic Graph (DAG). Then, a series of components were developed to construct data processing pipelines, and a pipeline task execution mechanism was designed. At the same time, to standardize and simplify pipeline description, a model driven big data pipeline description language called PiFlowDL was designed, which describes big data processing tasks in a modular and hierarchical way. PiFlow configures pipelines in a What You See Is What You Get (WYSIWYG) way and integrates functions such as status monitoring, template configuration and component integration. Compared with Apache NiFi, PiFlow achieves a performance improvement of 2 to 7 times.
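The abstract models a processing flow as a directed acyclic graph of components that are executed in dependency order. The following is a minimal Python sketch of that general idea only; it is not PiFlow's API, PiFlowDL, or the paper's actual execution mechanism. The names `Stop`, `topological_order` and `run_pipeline`, as well as the toy components, are hypothetical, and execution here is sequential and in-memory rather than distributed.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical component type: a "stop" receives the outputs of its upstream
# stops (in the order their edges were declared) and returns a new value.
Stop = Callable[[List[object]], object]
Edge = Tuple[str, str]


def topological_order(nodes: List[str], edges: List[Edge]) -> List[str]:
    """Kahn's algorithm; raises ValueError if the graph contains a cycle."""
    indegree = {n: 0 for n in nodes}
    succ: Dict[str, List[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        succ[src].append(dst)
        indegree[dst] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order: List[str] = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("pipeline contains a cycle; it must be a DAG")
    return order


def run_pipeline(stops: Dict[str, Stop], edges: List[Edge]) -> Dict[str, object]:
    """Run each stop only after all of its upstream stops have produced output."""
    nodes = list(stops)
    upstream: Dict[str, List[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        upstream[dst].append(src)
    results: Dict[str, object] = {}
    for name in topological_order(nodes, edges):
        inputs = [results[u] for u in upstream[name]]
        results[name] = stops[name](inputs)
    return results


# Toy pipeline (hypothetical): read -> filter -> {sort, count} -> write
stops: Dict[str, Stop] = {
    "read": lambda _: [3, 1, 2, 5, 4],
    "filter": lambda ins: [x for x in ins[0] if x > 1],
    "sort": lambda ins: sorted(ins[0]),
    "count": lambda ins: len(ins[0]),
    "write": lambda ins: {"rows": ins[0], "count": ins[1]},
}
edges: List[Edge] = [
    ("read", "filter"),
    ("filter", "sort"),
    ("filter", "count"),
    ("sort", "write"),
    ("count", "write"),
]
results = run_pipeline(stops, edges)
print(results["write"])  # {'rows': [2, 3, 4, 5], 'count': 4}
```

Rejecting cycles up front mirrors why a DAG is a natural model for such pipelines: every component has a well-defined point at which all of its inputs are available, and independent branches (here `sort` and `count`) could in principle be scheduled in parallel by a real engine.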